117 research outputs found

    Network Lasso: Clustering and Optimization in Large Graphs

    Full text link
    Convex optimization is an essential tool for modern data analysis, as it provides a framework to formulate and solve many problems in machine learning and data mining. However, general convex optimization solvers do not scale well, and scalable solvers are often specialized to only work on a narrow class of problems. Therefore, there is a need for simple, scalable algorithms that can solve many common optimization problems. In this paper, we introduce the \emph{network lasso}, a generalization of the group lasso to a network setting that allows for simultaneous clustering and optimization on graphs. We develop an algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in a distributed and scalable manner, which allows for guaranteed global convergence even on large graphs. We also examine a non-convex extension of this approach. We then demonstrate that many types of problems can be expressed in our framework. We focus on three in particular - binary classification, predicting housing prices, and event detection in time series data - comparing the network lasso to baseline approaches and showing that it is both a fast and accurate method of solving large optimization problems

    Optimized parameter search for large datasets of the regularization parameter and feature selection for ridge regression

    Get PDF
    In this paper we propose mathematical optimizations to select the optimal regularization parameter for ridge regression using cross-validation. The resulting algorithm is suited for large datasets and the computational cost does not depend on the size of the training set. We extend this algorithm to forward or backward feature selection in which the optimal regularization parameter is selected for each possible feature set. These feature selection algorithms yield solutions with a sparse weight matrix using a quadratic cost on the norm of the weights. A naive approach to optimizing the ridge regression parameter has a computational complexity of the order with the number of applied regularization parameters, the number of folds in the validation set, the number of input features and the number of data samples in the training set. Our implementation has a computational complexity of the order . This computational cost is smaller than that of regression without regularization for large datasets and is independent of the number of applied regularization parameters and the size of the training set. Combined with a feature selection algorithm the algorithm is of complexity and for forward and backward feature selection respectively, with the number of selected features and the number of removed features. This is an order faster than and for the naive implementation, with for large datasets. To show the performance and reduction in computational cost, we apply this technique to train recurrent neural networks using the reservoir computing approach, windowed ridge regression, least-squares support vector machines (LS-SVMs) in primal space using the fixed-size LS-SVM approximation and extreme learning machines

    Generalized methods and solvers for noise removal from piecewise constant signals. II. New methods

    Get PDF
    Removing noise from signals which are piecewise constant (PWC) is a challenging signal processing problem that arises in many practical scientific and engineering contexts. In the first paper (part I) of this series of two, we presented background theory building on results from the image processing community to show that the majority of these algorithms, and more proposed in the wider literature, are each associated with a special case of a generalized functional, that, when minimized, solves the PWC denoising problem. It shows how the minimizer can be obtained by a range of computational solver algorithms. In this second paper (part II), using this understanding developed in part I, we introduce several novel PWC denoising methods, which, for example, combine the global behaviour of mean shift clustering with the local smoothing of total variation diffusion, and show example solver algorithms for these new methods. Comparisons between these methods are performed on synthetic and real signals, revealing that our new methods have a useful role to play. Finally, overlaps between the generalized methods of these two papers and others such as wavelet shrinkage, hidden Markov models, and piecewise smooth filtering are touched on

    Generalized methods and solvers for noise removal from piecewise constant signals. I. Background theory

    Get PDF
    Removing noise from piecewise constant (PWC) signals is a challenging signal processing problem arising in many practical contexts. For example, in exploration geosciences, noisy drill hole records need to be separated into stratigraphic zones, and in biophysics, jumps between molecular dwell states have to be extracted from noisy fluorescence microscopy signals. Many PWC denoising methods exist, including total variation regularization, mean shift clustering, stepwise jump placement, running medians, convex clustering shrinkage and bilateral filtering; conventional linear signal processing methods are fundamentally unsuited. This paper (part I, the first of two) shows that most of these methods are associated with a special case of a generalized functional, minimized to achieve PWC denoising. The minimizer can be obtained by diverse solver algorithms, including stepwise jump placement, convex programming, finite differences, iterated running medians, least angle regression, regularization path following and coordinate descent. In the second paper, part II, we introduce novel PWC denoising methods, and comparisons between these methods performed on synthetic and real signals, showing that the new understanding of the problem gained in part I leads to new methods that have a useful role to play

    Explaining Support Vector Machines: A Color Based Nomogram.

    Get PDF
    PROBLEM SETTING: Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Due to the large choice of kernels they can be applied with, a large variety of data can be analysed using these tools. Machine learning thanks its popularity to the good performance of the resulting models. However, interpreting the models is far from obvious, especially when non-linear kernels are used. Hence, the methods are used as black boxes. As a consequence, the use of SVMs is less supported in areas where interpretability is important and where people are held responsible for the decisions made by models. OBJECTIVE: In this work, we investigate whether SVMs using linear, polynomial and RBF kernels can be explained such that interpretations for model-based decisions can be provided. We further indicate when SVMs can be explained and in which situations interpretation of SVMs is (hitherto) not possible. Here, explainability is defined as the ability to produce the final decision based on a sum of contributions which depend on one single or at most two input variables. RESULTS: Our experiments on simulated and real-life data show that explainability of an SVM depends on the chosen parameter values (degree of polynomial kernel, width of RBF kernel and regularization constant). When several combinations of parameter values yield the same cross-validation performance, combinations with a lower polynomial degree or a larger kernel width have a higher chance of being explainable. CONCLUSIONS: This work summarizes SVM classifiers obtained with linear, polynomial and RBF kernels in a single plot. Linear and polynomial kernels up to the second degree are represented exactly. For other kernels an indication of the reliability of the approximation is presented. The complete methodology is available as an R package and two apps and a movie are provided to illustrate the possibilities offered by the method

    A Mathematical Model for Interpretable Clinical Decision Support with Applications in Gynecology

    Get PDF
    Over time, methods for the development of clinical decision support (CDS) systems have evolved from interpretable and easy-to-use scoring systems to very complex and non-interpretable mathematical models. In order to accomplish effective decision support, CDS systems should provide information on how the model arrives at a certain decision. To address the issue of incompatibility between performance, interpretability and applicability of CDS systems, this paper proposes an innovative model structure, automatically leading to interpretable and easily applicable models. The resulting models can be used to guide clinicians when deciding upon the appropriate treatment, estimating patient-specific risks and to improve communication with patients.We propose the interval coded scoring (ICS) system, which imposes that the effect of each variable on the estimated risk is constant within consecutive intervals. The number and position of the intervals are automatically obtained by solving an optimization problem, which additionally performs variable selection. The resulting model can be visualised by means of appealing scoring tables and color bars. ICS models can be used within software packages, in smartphone applications, or on paper, which is particularly useful for bedside medicine and home-monitoring. The ICS approach is illustrated on two gynecological problems: diagnosis of malignancy of ovarian tumors using a dataset containing 3,511 patients, and prediction of first trimester viability of pregnancies using a dataset of 1,435 women. Comparison of the performance of the ICS approach with a range of prediction models proposed in the literature illustrates the ability of ICS to combine optimal performance with the interpretability of simple scoring systems.The ICS approach can improve patient-clinician communication and will provide additional insights in the importance and influence of available variables. Future challenges include extensions of the proposed methodology towards automated detection of interaction effects, multi-class decision support systems, prognosis and high-dimensional data
    • …
    corecore